
[DO NOT MERGE] 0.23.1 + StackRox patches#97

Draft
Stringy wants to merge 30 commits into upstream-main from 0.23.1-stackrox-rc1

Conversation

@Stringy
Collaborator

@Stringy Stringy commented Feb 26, 2026

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

fremmi and others added 25 commits December 22, 2025 15:30
…ing due to integer overflow

Add validation in ppm_cmsg_nxthdr to ensure cmsg_aligned_len is at least
sizeof(ppm_cmsghdr) after alignment calculation. This prevents an infinite
loop when malformed ancillary data contains cmsg_len = 0xFFFFFFFFFFFFFFFF,
which causes integer overflow in PPM_CMSG_ALIGN macro, resulting in
cmsg_aligned_len = 0 and preventing forward progress in the loop.

Signed-off-by: Francesco Emmi <francesco.emmi@sysdig.com>
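
To illustrate the overflow described in the commit above, here is a minimal sketch in plain userspace C. The names `demo_cmsghdr`, `demo_cmsg_align` and `demo_aligned_len_is_valid` are hypothetical stand-ins, not the driver's actual `ppm_cmsghdr` or `PPM_CMSG_ALIGN` definitions, and the alignment formula is assumed to follow the usual round-up-to-`sizeof(size_t)` pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's control message header. */
struct demo_cmsghdr {
    size_t cmsg_len;
    int cmsg_level;
    int cmsg_type;
};

/* Round len up to a multiple of sizeof(size_t) (assumed alignment rule).
 * The addition wraps around for values close to SIZE_MAX. */
static size_t demo_cmsg_align(size_t len) {
    return (len + sizeof(size_t) - 1) & ~(sizeof(size_t) - 1);
}

/* Returns 1 if the aligned length is large enough for the loop to make
 * forward progress, 0 if the header must be rejected. */
static int demo_aligned_len_is_valid(size_t cmsg_len) {
    size_t aligned = demo_cmsg_align(cmsg_len);
    return aligned >= sizeof(struct demo_cmsghdr);
}
```

With `cmsg_len` set to `SIZE_MAX`, the addition wraps and the aligned length becomes 0, so a loop that advances by the aligned length would never move. Checking the aligned length against the header size rejects that case.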
Guard against invalid `cmsg_len` values while accessing control
messages in ancillary data. This is achieved by checking there is
enough space between the current control message and the end of the
buffer to hold both the current control message and the next one.

This change syncs the implementation of `ppm_cmsg_nxthdr()` with the
current glibc implementation:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/cmsg_nxthdr.c;h=0e602a16053ed6742ea1556d75de8540e49157f1;hb=170550da27f68a08589e91b541883dcc58dee640

Signed-off-by: Leonardo Di Giovanna <leonardodigiovanna1@gmail.com>
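
The bounds check described in the `ppm_cmsg_nxthdr()` commit above could look roughly like the following userspace sketch, modeled on the shape of the glibc check. All `demo_`/`DEMO_` names are hypothetical, and the real driver code works on its own types, so treat this as an approximation rather than the patched function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's control message header. */
struct demo_cmsghdr {
    size_t cmsg_len;
    int cmsg_level;
    int cmsg_type;
};

#define DEMO_ALIGN(len) (((len) + sizeof(size_t) - 1) & ~(sizeof(size_t) - 1))
/* Bytes of padding needed to align len; never overflows. */
#define DEMO_PADDING(len) \
    ((sizeof(size_t) - ((len) & (sizeof(size_t) - 1))) & (sizeof(size_t) - 1))

/* Returns the next header, or NULL if cmsg_len cannot be trusted. */
static struct demo_cmsghdr *demo_cmsg_nxthdr(unsigned char *control,
                                             size_t controllen,
                                             struct demo_cmsghdr *cmsg) {
    unsigned char *cmsg_ptr = (unsigned char *)cmsg;
    size_t remaining = (size_t)(control + controllen - cmsg_ptr);
    size_t size_needed = sizeof(struct demo_cmsghdr) + DEMO_PADDING(cmsg->cmsg_len);

    /* Too small to be a full header. */
    if (cmsg->cmsg_len < sizeof(struct demo_cmsghdr)) {
        return NULL;
    }
    /* Not enough room for the current message and the next header. */
    if (remaining < size_needed || remaining - size_needed < cmsg->cmsg_len) {
        return NULL;
    }
    /* cmsg_len is now bounded by the buffer, so the alignment cannot wrap. */
    return (struct demo_cmsghdr *)(cmsg_ptr + DEMO_ALIGN(cmsg->cmsg_len));
}
```

Because `cmsg_len` is compared against the remaining buffer space before it is ever aligned, a value such as `SIZE_MAX` is rejected without the wrapping addition being evaluated.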
Signed-off-by: Roberto Scolaro <roberto.scolaro21@gmail.com>
Signed-off-by: irozzo-1A <iacopo@sysdig.com>
Signed-off-by: Leonardo Di Giovanna <leonardodigiovanna1@gmail.com>
Optimize scanning by filtering out file descriptors that are not
socket fds.
Introduce the interesting_subsys set to configure which cgroup
subsystems are going to be considered in set_cgroups.
Signed-off-by: Matthew Knight <matthew.knight@sysdig.com>
Signed-off-by: Matthew Knight <matthew.knight@sysdig.com>
Signed-off-by: Afsan Hossain <84701952+mdafsanhossain@users.noreply.github.com>
Allow ASSERT failures to be logged instead of hard-stopping. The change
also enables ASSERT logging in Release mode at the debug logging level
to give more information when troubleshooting.
Experiment with disabling trusted exepath to verify stackrox tests
There is an unexpected difference between kernels compiled with gcc and
clang. The former doesn't support rcu attributes yet, meaning that
READ_TASK_FIELD_INTO on task->cred produces PTR_TO_BTF_ID. The latter
does support the rcu attribute, and READ_TASK_FIELD_INTO ends up with
PTR_TO_BTF_ID | MEM_RCU | MAYBE_NULL [1]. COS seems to be using clang to
compile the kernel:

    # from /boot/config
    CONFIG_CC_VERSION_TEXT="Chromium OS 16.0_pre484197_p20230405-r12 clang version 16.0.0
    (/var/tmp/portage/sys-devel/llvm-16.0_pre484197_p20230405-r12/work/llvm-16.0_pre484197_p20230405/clang 2916b99182752b1aece8cc4479d8d6a20b5e02da)"

The verifier rejects the MAYBE_NULL part. To fix it, we need to break
the read chain and verify that the cred structure is not null.

[1]: https://lore.kernel.org/bpf/20230228040121.94253-3-alexei.starovoitov@gmail.com/
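
The idea of breaking the read chain can be sketched in plain userspace C as follows. The `demo_task` and `demo_cred` structures and the `demo_read_uid` helper are hypothetical; the real probe uses CO-RE read macros on `struct task_struct`, which this sketch does not attempt to reproduce.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct demo_cred {
    uint32_t uid;
};

struct demo_task {
    const struct demo_cred *cred; /* may be NULL from the verifier's view */
};

/* Read task->cred->uid in separate steps with an explicit null check,
 * instead of one chained dereference. Returns 0 on success, -1 if the
 * intermediate pointer is NULL. */
static int demo_read_uid(const struct demo_task *task, uint32_t *uid_out) {
    const struct demo_cred *cred = task->cred; /* step 1: read the pointer */
    if (cred == NULL) {                        /* step 2: prove it is not NULL */
        return -1;
    }
    *uid_out = cred->uid;                      /* step 3: read through it */
    return 0;
}
```

In a BPF program the explicit check is what lets the verifier drop the MAYBE_NULL flag from the pointer before the second read.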
Along the way silence compiler warnings about task_cred
* Adapt the modern probe to clang 21

The code generated by clang 21 is more 'complex' and reaches the
verifier's 1000000-instruction limit on execve().

* Force casting size(r2) parameter to bpf_probe_read_user()

...otherwise, the compiler thinks that the calling convention allows
it to optimize away the unsigned truncation, and the verifier disagrees.

* Decrease MAX_IOVCNT to satisfy the verifier on rhel
@Stringy Stringy changed the title [DO NOT MERGED] 0.23.1 + StackRox patches [DO NOT MERGE] 0.23.1 + StackRox patches Feb 27, 2026
Refactor sys_exit to use a single maps__get_capture_settings() lookup
instead of multiple inlined calls. Clang optimizes away null checks on
repeated bpf_map_lookup_elem() calls, causing the BPF verifier to
reject the program with "R0 invalid mem access 'map_value_or_null'"
on kernels older than 6.17 (RHEL 9, Ubuntu 22.04, COS, etc.).
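
A rough userspace sketch of the single-lookup pattern follows. The `demo_capture_settings` structure, its fields, and `demo_get_capture_settings()` are hypothetical and only mimic a map lookup that may return NULL; they are not the actual `maps__get_capture_settings()` definition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical settings record; field names are illustrative only. */
struct demo_capture_settings {
    uint32_t snaplen;
    bool dropping_mode;
};

static struct demo_capture_settings demo_slot = {.snaplen = 80, .dropping_mode = false};
static bool demo_slot_present = true;

/* Stand-in for a map lookup: may return NULL, like bpf_map_lookup_elem(). */
static struct demo_capture_settings *demo_get_capture_settings(void) {
    return demo_slot_present ? &demo_slot : NULL;
}

/* One lookup and one null check; every later read reuses the checked pointer. */
static int demo_sys_exit(uint32_t *snaplen_out, bool *dropping_out) {
    struct demo_capture_settings *settings = demo_get_capture_settings();
    if (settings == NULL) {
        return -1;
    }
    *snaplen_out = settings->snaplen;
    *dropping_out = settings->dropping_mode;
    return 0;
}
```

Keeping one checked pointer means there is no second lookup whose null check the compiler could fold away.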
@Stringy Stringy force-pushed the 0.23.1-stackrox-rc1 branch from 4c3bb8a to 6947a02 Compare March 10, 2026 09:51
@Stringy Stringy force-pushed the 0.23.1-stackrox-rc1 branch from ff51726 to fd1b2a7 Compare March 11, 2026 15:46
Stringy and others added 3 commits March 12, 2026 09:44
Restore upstream t1/t2 parameter split (params 15-27 in t1, 28-31 in
t2) and remove volatile qualifier from push__bytebuf's len_to_read.
Both changes were causing BPF verifier instruction limit overflow on
RHEL 9.4 kernels (5.14.0-427.x).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The COS verifier fix broke cred pointer chains using
READ_TASK_FIELD_INTO, which generates 2 CO-RE relocation branches per
call for the bpf_get_current_task_btf existence check. With 8 such
calls inlined into t1_execve_x, this added 16 CO-RE branches causing
BPF verifier state explosion on RHEL 9.4 kernels.

Switch to BPF_CORE_READ_INTO which still breaks the pointer chain (as
COS requires) but without the CO-RE branching overhead. This brings
the branch count back in line with upstream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two fixes:

1. Mask ret with & 0xFFFF when assigning to snaplen in 12 BPF programs.
   The verifier on older kernels (RHEL 8 4.18, COS 6.6) rejects
   bpf_probe_read_user calls where the size argument (ret) could be
   negative. The mask tells the verifier the value is bounded unsigned.

2. Stub out t1_execve_x and t2_execve_x with #if 0 around their bodies.
   These tail calls are unreachable since execve_x returns early for both
   success and failure cases (ROX-31971). Their complexity exceeded the
   1M instruction verifier limit on RHEL SAP 9.4 (kernel 5.14.0-427).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
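
The masking in fix 1 of the commit above can be illustrated with a small userspace sketch. The helper name `demo_bounded_snaplen` and the choice of a 64-bit signed return value are assumptions for illustration; the actual BPF programs assign to their own `snaplen` variable.

```c
#include <stdint.h>

/* Hypothetical helper: ret is a syscall return value that may be negative.
 * Masking with 0xFFFF bounds the result to the range [0, 65535]. */
static uint16_t demo_bounded_snaplen(int64_t ret) {
    return (uint16_t)(ret & 0xFFFF);
}
```

The mask only bounds the value so that it can never be seen as negative; it does not turn an error return into a meaningful length, so error cases still need their own handling.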


10 participants